Tandem representations of spectral envelope and modulation frequency features for ASR
Abstract
We present a feature extraction technique for automatic speech recognition (ASR) that uses Tandem representations of short-term spectral envelope and modulation frequency features. These features are derived from sub-band temporal envelopes of speech estimated using frequency domain linear prediction (FDLP), and they are combined at the phoneme posterior level. Tandem representations derived from these phoneme posteriors are then used with HMM-based ASR systems for both small-vocabulary and large-vocabulary continuous speech recognition (LVCSR) tasks. On a small-vocabulary continuous digit task using the OGI Digits database, the proposed features reduce the word error rate (WER) by 13% relative to other feature extraction techniques. On an LVCSR task using the NIST RT05 evaluation data, they yield a relative WER reduction of about 14%. On phoneme recognition tasks using the TIMIT database, they provide a relative improvement of 13% over other techniques.
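The posterior-combination and Tandem steps described in the abstract can be sketched as follows. This is a minimal sketch, assuming the per-frame phoneme posteriors from the spectral-envelope stream and the modulation-frequency stream are already available; in practice they come from trained classifiers (typically MLPs), which are not shown here. The abstract does not state how the two streams are merged, so the product rule in the hypothetical `combine_posteriors` is an assumption, as are the 40-phoneme and 25-dimension sizes in the demo. The hypothetical `tandem_features` applies the common Tandem recipe: log-compress the posteriors, then decorrelate them with a KLT (PCA) so they better suit diagonal-covariance Gaussian HMM states.

```python
import numpy as np


def combine_posteriors(p_env, p_mod, eps=1e-10):
    """Fuse two streams' frame-level phoneme posteriors.

    ASSUMPTION: a normalized product rule; the source does not specify the rule.
    Inputs have shape (frames, phonemes) with rows summing to 1.
    """
    log_p = np.log(p_env + eps) + np.log(p_mod + eps)
    log_p -= log_p.max(axis=1, keepdims=True)  # avoid underflow before exp
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)  # renormalize each frame


def tandem_features(posteriors, n_components, eps=1e-10):
    """Log-compress posteriors and project them onto the top KLT/PCA axes.

    For brevity the KLT is estimated on the same frames it transforms; in a real
    system it would be estimated on training data and reused on test data.
    """
    log_post = np.log(posteriors + eps)
    centered = log_post - log_post.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep largest-variance axes
    return centered @ eigvecs[:, order]


# Hypothetical stand-ins for per-frame classifier outputs of the two streams.
rng = np.random.default_rng(0)
p_envelope = rng.dirichlet(np.ones(40), size=500)
p_modulation = rng.dirichlet(np.ones(40), size=500)

feats = tandem_features(combine_posteriors(p_envelope, p_modulation), n_components=25)
print(feats.shape)  # (500, 25)
```

The resulting decorrelated vectors (optionally appended to standard cepstral features) would serve as observations for a conventional HMM recognizer.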
Similar resources
A study on temporal features derived by analytic signal
Traditional feature extraction methods for automatic speech recognition (ASR), such as MFCC (Mel-frequency cepstral coefficients) and PLP (perceptual linear prediction) [6], are extracted from short-term spectral envelopes and can be used to realize promising ASR systems. On the other hand, features extracted by TRAPs-like classifiers [2] are based on long-term envelopes of narrow-band signals....
Speech recognition from spectral dynamics
Information is carried in changes of a signal. The paper starts with revisiting Dudley’s concept of the carrier nature of speech. It points to its close connection to modulation spectra of speech and argues against short-term spectral envelopes as dominant carriers of the linguistic information in speech. The history of spectral representations of speech is briefly discussed. Some of the histor...
Some Emerging Concepts in Speech Recognition
The paper presents a work-in-progress on several emerging concepts in Automatic Speech Recognition (ASR), that are being currently studied at IDIAP. This work can be roughly categorized into three categories: 1) data-guided features, 2) features based on modulation spectrum of speech, 3) minimum entropy based multi-stream information fusion. 1. DATA-GUIDED FEATURES Summary: Optimal set of featu...
Tandem processing of fepstrum features
In our previous work [1, 2], we have introduced Fepstrum, an improved modulation spectrum estimation technique that overcomes certain theoretical as well as practical shortcomings in the previously published modulation spectrum related techniques [7, 8, 9]. In this paper, we provide further extensive ASR results using the Tandem processed Fepstrum features over the TIMIT corpus. The results are c...
Optimization and evaluation of Gabor feature sets for ASR
In order to enhance automatic speech recognition performance in adverse conditions, Gabor features motivated by physiological measurements in the primary auditory cortex were optimized and evaluated. In the Aurora 2 experimental setup such localized, spectro-temporal filters combined with a Tandem system yield robust performance with a feature set size of 30. Improved results can be obtained wh...
Publication date: 2009